Technical  Information  Service 



AMERICAN INSTITUTE OF AERONAUTICS AND ASTRONAUTICS

555  WEST  57th  STREET  NEW  YORK,  N.Y.  10019  212/247-6500 


Accession  Number:  2201 


Title:  Assessment  of  the  Development  of  a  Tracking  System  Using  Concurrent  Ada  (paper) 

Personal Author: Lemanski, W.J.; Hartrum, T.C.

Corporate  Author  Or  Publisher:  AIAA 

Descriptors,  Keywords:  Methodology  Development  Communication  Tracking  Algorithm  Cyclic  Integration  Parameter  Allocation 
Ada  Software  Assessment  Kalman  Filter  Code 

Pages:  008 

Cataloged  Date:  Jun  18,  1990 
Document  Type:  HC 
Number  of  Copies  In  Library:  000001 
Original  Source  Number:  A90-30722 
Record  ID:  21120 


Source  of  Document:  AIAA 


Approved for public release;


AN ASSESSMENT OF THE DEVELOPMENT OF A
TRACKING SYSTEM USING CONCURRENT ADA

School  of  Engineering 

Air  Force  Institute  of  Technology 

Wright-Patterson AFB, Dayton, Ohio 45433


Abstract

[Abstract illegible in the source scan. The surviving fragments
mention a processor, Ada, speed-up results, and the methodology
used.]


1  Introduction 

Although successful realization of the goals of the Strategic
Defense Initiative will require significant advances in many
areas of science and engineering, it is now generally accepted
that computers and their associated software have become the
“long pole in the tent.” SDI software will be the most complex
ever developed. The operational system will have to run in
real-time and be extremely reliable. In addition, extensive
ground-based simulators will be necessary.

Parallel processing shows great promise not only for meeting
the computationally intensive real-time deadlines of SDI
software, but also for substituting processing power for
software complexity by allowing the use of simple
compute-intensive algorithms rather than complex optimized
ones. Parallel processors can also increase the reliability of
the system by distributing the computing load [...]. Finally,
parallel systems [...].

To [...] the power of parallel processing effectively, it is
necessary to have a software programming [...] algorithms. Ada
was specifically designed for use in parallel real-time
software development. It explicitly supports independent
communicating processes through the use of [...] operations
with [...] specifications. In addition, Ada is a third
generation high [...].

The purpose of this research was to explore the possibilities
presented by parallel processing and the Ada programming
language, and to determine how they could benefit the
development of SDI software systems. To do this, a non-trivial
application was selected to be designed and implemented as a
parallel Ada system. The Air Force is developing a tracking
system which uses a forward-looking [...] tracking algorithm to
provide target information to a laser pointing system. The
purpose of the tracking algorithm is to detect vehicle movement
in order to keep the target centered in the tracker’s field of
view, simultaneously pointing the laser.

The tracking algorithm is being developed at the Air Force
Institute of Technology (AFIT) based on Kalman filtering
techniques [8,9,10,14]. The heart of this tracking system is a
Kalman filter algorithm which processes inputs from measurement
devices together with the knowledge of applicable device
dynamics, statistical descriptions of noises, measurement
errors, modeling uncertainties, and initial conditions, to
produce an optimal estimate of the current and predicted future
position of the target. The tracker is made even more accurate
through the use of a multiple model adaptive filter (MMAF)
system. The MMAF is composed of a group of Kalman filters
constructed with varying target characteristic specifications.
The outputs from the filters are arbitrated by giving varying
weight to each filter, based upon its predictive accuracy. The
result is an estimate more accurate than a single filter could
provide. However, this increased accuracy comes at a cost of
increased computational loading. The size and complexity of the
algorithm make it a good test case for parallelization. In
addition, it is an excellent example of the type of
ground-based simulation SDI will require.
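
The arbitration of the filter bank described above can be
sketched in a few lines. This is only an illustration, not the
AFIT algorithm: it assumes scalar measurements and a Gaussian
likelihood for each filter's residual, which is a common choice
for multiple model estimators. The names update_weights and
blend_estimates are hypothetical.

```python
import math


def update_weights(weights, residuals, variances):
    """Re-weight each filter by how well it predicted the measurement.

    Assumption (not from the paper): scalar residuals with a Gaussian
    likelihood, one residual and one residual variance per filter.
    """
    likelihoods = [
        math.exp(-0.5 * r * r / v) / math.sqrt(2.0 * math.pi * v)
        for r, v in zip(residuals, variances)
    ]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        # Every filter predicted very poorly; keep the previous weights.
        return list(weights)
    return [u / total for u in unnormalized]


def blend_estimates(weights, estimates):
    """Combine the per-filter estimates into one weighted estimate."""
    return sum(w * x for w, x in zip(weights, estimates))
```

Each filter's update is independent of the others until the
final blend, which is why the bank is a natural candidate for
one task per filter.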

The system was implemented on an Encore Multimax 320, a
tightly-coupled multiprocessor architecture based on the
National Semiconductor 32332 processor. The Multimax 320 at
AFIT is configured with 16 processors and 32 Mbytes of main
memory. The system operates under a single copy of UMAX (which
is a derivative of UNIX) which is accessible by all of the
processors simultaneously. UMAX implements the concept of
multi-threading, allowing it to support multiple, simultaneous
streams of control. Interprocess communication occurs through
shared memory. The operating system also supports process
migration, making dynamic load balancing possible [4].

The Encore Ada run-time environment makes true concurrent Ada
possible. Rather than using some single-processor interleaving
scheme, Encore Concurrent Ada allows the user to specify the
number of independent processes desired. Tasks are then
scheduled on a first-come first-served basis from a single
queue when a process becomes available. Task priorities are
supported through the use of multiple task queues, one for each
priority level. The user has no explicit control over task
allocation or scheduling. The processes are assigned to
processors by the operating system. Tasks are assigned to
available processes by the Ada run-time system [5].
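
The queueing policy just described can be modeled as a small
dispatcher. This is a sketch of the policy only, not of the
Encore run-time itself. The class name TaskDispatcher, its
methods, and the convention that a larger number means a higher
priority are all assumptions made for illustration.

```python
from collections import deque


class TaskDispatcher:
    """Toy model: one first-come first-served queue per priority level."""

    def __init__(self, priority_levels):
        self._queues = [deque() for _ in range(priority_levels)]

    def make_ready(self, task_name, priority=0):
        """Queue a task that has become ready to run."""
        self._queues[priority].append(task_name)

    def next_task(self):
        """Pick the task for a process that has just become available.

        Returns None when no task is ready.
        """
        for ready in reversed(self._queues):  # highest priority first
            if ready:
                return ready.popleft()
        return None
```

With two filter tasks queued at priority 0 and a sensor task at
priority 1, successive calls to next_task return the sensor task
first and then the filters in arrival order.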

The ability to do true concurrent programming in the Ada
language is relatively new. Many software engineering issues in
this area still need to be resolved. The purpose of this paper
is to discuss how we resolved these issues in the context of a
functional concurrent Ada implementation. The issues discussed
include: language partitioning, multitasking design
requirements, problem decomposition, and a parallel Ada design
method. A description of the implementation and testing methods
used is also included. The paper concludes with a discussion of
project results including an analysis of both the design
methodology and the adequacy of the Ada language for parallel
software development.

2  System  Analysis  and  Design 

2.1  Language  Partitioning  Strategy 

Despite the fact that the task construct was specifically
included in the language to support parallel processing, there
is some disagreement as to the exact time and method that
should be used for partitioning Ada programs. There are two
times in the development process when software can be
partitioned. The two corresponding strategies are


known  as  pre-partitioning  and  post-partitioning  [2lj.  In 
post-partitioning, after the program is designed and written as
if for a single processor, the desired partitioning is
specified using separate software tools, or is accomplished
automatically by the distributed operating system.
Pre-partitioning begins at the very start of the design
process. A particular construct of the programming language is
selected as the basis of parallelization and used to
encapsulate each of the system parts that will run in parallel.

One advantage claimed for the post-partitioning strategy is
that it promotes portability by allowing the same program to be
mapped onto different hardware configurations. Another is that,
because the program is written to run sequentially, there are
no restrictions on how the language is used. Finally, it avoids
concern that the Ada language does not contain facilities for
specifying the configuration of the software over the
underlying hardware.

Automated post-partitioning does not currently exist, nor is it
likely to be developed in the near future. Manual
post-partitioning is possible using tools such as Honeywell’s
Ada Program Partitioning Language [2,3]. This language allows
the mapping details to be expressed completely separately from
the program. However, the efficiency of the tool remains to be
demonstrated.

Correct use of the Ada language in the pre-partitioning
strategy can provide the advantages claimed for
post-partitioning. Operations that lead to hardware dependency
in a parallel system, such as task synchronization and
dispatching, storage management, and exception handling, are
handled in Ada by a machine-specific run-time environment and
not compiler-generated code. As a result, concurrent software
written in Ada can be made portable if properly implemented.

Concerns about restrictions on language use in this strategy
are also unfounded. Although Ada code must be encapsulated in
the task construct to run concurrently, the uniformity of the
language reduces the impact on the programmer’s productivity.
For example, task entry calls mimic procedure calls and can be
used in the same manner procedure calls would be used in a
sequential program. Within the task, all operations are
available that would be available in a corresponding sequential
program. Therefore, the use of the task construct places no
unreasonable restrictions on the programmer.

The lack of configuration facilities in Ada can be overcome
through the use of data driven design [13]. Rather than
depending on manual partitioning, which would require
configuration facilities to schedule and control independent
tasks, data driven design depends on the run-time environment
scheduler to control access to processors, based on which tasks
are ready to run and their relative priorities. Because the Ada
rendezvous mechanism blocks tasks that are awaiting data, the
flow of data through the system determines which tasks are
ready to run. By controlling this flow of data, the designer
can control the scheduling of tasks without extra-language
configuration management facilities.
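
The idea that data flow alone determines which tasks run can be
illustrated outside Ada. The sketch below uses Python threads
and single-slot queues as a stand-in for tasks and rendezvous; a
single-slot queue is a bounded buffer rather than a true
rendezvous, so this only approximates the blocking behavior. The
function name run_data_driven and the doubling "filter" are
hypothetical.

```python
import queue
import threading


def run_data_driven(values):
    """Run a two-stage chain whose execution order is set by data flow."""
    raw = queue.Queue(maxsize=1)       # sensor -> filter
    filtered = queue.Queue(maxsize=1)  # filter -> display
    done = object()                    # sentinel marking end of input
    log = []

    def filter_task():
        while True:
            item = raw.get()           # blocks until data arrives
            if item is done:
                filtered.put(done)
                return
            filtered.put(item * 2)     # placeholder computation

    def display_task():
        while True:
            item = filtered.get()      # blocks until data arrives
            if item is done:
                return
            log.append(item)

    # The downstream task is started first on purpose: it still cannot
    # do any work until data reaches it.
    threads = [
        threading.Thread(target=display_task),
        threading.Thread(target=filter_task),
    ]
    for t in threads:
        t.start()
    for v in values:
        raw.put(v)
    raw.put(done)
    for t in threads:
        t.join()
    return log
```

No task in this sketch is told when to run. Each one simply
blocks until its input is available, which is the property the
data driven approach relies on.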

Data driven design results in minimum processor idle time
because ready tasks do not have to wait for the start of their
frame time as in the case of the cyclic executive. Instead,
they are scheduled as soon as a processor is available. In
addition, data driven designs adapt automatically to changes in
hardware resources, timing, or processing requirements because
scheduling decisions are made in real-time by the run-time
system based on the availability of data and processing
resources. This method also eliminates the need for time frames
and, therefore, the possibility of frame overrun. On the other
hand, timing problems are still possible if processing
requirements outpace available processing resources. Because
high priority tasks are always scheduled first, any timing
problems will show up in low priority tasks first. The result
is a throughput problem in the overall system. Since the
processor idle time is already at a minimum, the only way to
solve this problem is by increasing processor resources or
decreasing computational demand.

Pre-partitioning is the superior strategy for parallel design
in Ada because it promotes early examination of critical design
issues and eliminates the overhead of separate partitioning
specifications. In addition, careful design can yield the
benefits of the post-partitioning strategy including: portable
software, freedom of language use, and adequate concurrent
process control using the concept of concurrent design.

Given the pre-partitioning strategy and the Ada language,
partitioning can occur at four language levels [2,3]. Because
two of these, partitioning on any source program construct and
extending Ada, require facilities outside of those provided by
the Ada language, these two methods will not be considered
here. Of the remaining two methods, the first consists of
writing a separate program for each processor. This method was
commonly used in the past because of a lack of language
constructs designed specifically for parallel processing. It is
inefficient and has several disadvantages. First, this approach
ties the software closely to the underlying hardware. If the
hardware is later changed, the software must often be
redesigned and rewritten. Any attempt at reallocating functions
among processors will also require redesign.

Another problem is that much of the reliability gained from the
use of high order languages is lost. In Ada, semantic rules are
enforced by the compiler only within a program, not across
program boundaries. Compilers may even generate different
internal structures for logically identical data types in each
of the programs, leading to hidden problems when data is
exchanged between them. Finally, communications capabilities in
the Ada language are strong for passing data within programs,
but the only facility available for communication between
programs is inefficient and difficult I/O transfers.


The other possible alternative is to partition on task
boundaries by encapsulating the code representing each problem
fragment within a task. The entire system is contained within a
single program with the task acting as the basis for
concurrency. This method increases software portability, and
reallocation of functions within the program can be handled by
the run-time environment or with only localized changes to the
affected task. The fact that the whole system is contained
within a program allows for the full range of semantic checking
and makes Ada’s extensive communication capabilities available
to the programmer. These advantages make the use of the task
construct the superior option.

2.2  Multitasking  Design  Issues 

The use of the task construct as the basis for concurrent
processing introduces some new design issues. It is generally
accepted that strong cohesion and loose coupling lead to more
structured design, easier modification, and higher
maintainability. The idea of cohesion can be applied to tasks,
but the coupling concept must account for the interdependencies
that result from concurrent operations. In this context,
coupling can be considered at two levels. In the first case,
between subprograms and tasks, coupling can be evaluated in the
standard way.

The differences in the concept of coupling during concurrent
operations manifest themselves in the second case, task to task
interaction. This is called concurrency coupling. Tasks are
considered tightly coupled if one calls the other’s entry
directly. There are various degrees of tightly coupled tasks.
The tightest occurs during a rendezvous where in out or both in
and out parameters are included in the call. In this case, the
calling task requires a reply and the two tasks must remain
synchronized for the entire period of data transformation. A
lesser amount of coupling occurs when only in or out parameters
are in the call. Here, the two tasks are only synchronized long
enough for data to be copied from one to the other. A
parameterless call represents the least amount of coupling.
Tight coupling should be avoided whenever possible because
periods of synchronization eliminate independent operation and
thus reduce the efficiency of parallel processing. Loose
coupling is achieved through the use of intermediary tasks
between the caller and called task pair. These tasks perform a
buffering function to ensure that the two main tasks can
continue processing unhindered by unnecessary synchronization
time.

Three varieties of intermediary tasks are identified in [11]. A
buffer task is a server only. It contains an entry to accept
data from a producer and an entry to provide data to a consumer
upon request. In between producer and consumer calls, the data
is stored internal to the task. A transporter is strictly an
active task. It requests data via an entry call to the producer
task and then outputs the data via an entry call to a consumer.
A relay task is a combination of the two. It waits until called
by the producer and then immediately calls the consumer.
Intermediary tasks are used in various combinations to achieve
the desired degree of coupling between producer and consumer
tasks. They are also useful to control caller/called
relationships, and should be tailored directly to specific
design requirements. A good concurrent design should have a
balanced use of intermediary tasks with no cyclic dependencies
and a minimum amount of busy waiting. It should also minimize
the amount of processing done during a rendezvous (statements
within accept blocks) and ensure appropriate modes are used for
entry parameters.
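
A server-only intermediary of the kind described above can be
approximated as follows. This is a sketch rather than the design
used in the project: Python has no entries or accept statements,
so the two "entries" are modeled as methods guarded by a
condition variable, and the class name BufferTask and its
capacity parameter are assumptions.

```python
import threading
from collections import deque


class BufferTask:
    """Passive intermediary: stores data between producer and consumer."""

    def __init__(self, capacity):
        self._items = deque()
        self._capacity = capacity
        self._cond = threading.Condition()

    def put(self, item):
        """'Entry' called by the producer; waits only if the buffer is full."""
        with self._cond:
            while len(self._items) >= self._capacity:
                self._cond.wait()
            self._items.append(item)
            self._cond.notify_all()

    def get(self):
        """'Entry' called by the consumer; waits only if the buffer is empty."""
        with self._cond:
            while not self._items:
                self._cond.wait()
            item = self._items.popleft()
            self._cond.notify_all()
            return item
```

Both callers hold the lock only long enough to copy one item,
which mirrors the advice to keep the work done inside a
rendezvous to a minimum. A producer thread can call put
repeatedly while a consumer calls get at its own pace, and items
arrive in the order they were produced.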

The introduction of concurrency into the system and the need to
reduce task coupling must be balanced against the overhead of
the resulting tasks. This includes task activation,
termination, scheduling, dispatching, allocation of task
control blocks, and, when necessary, context switching,
exception handling, management of entry queues, and rendezvous.
The amount of overhead associated with each of these operations
is dependent on the specific run-time system implementation,
but it must always be considered.

2.3  Problem  Decomposition 

The first step in any parallel software project is to determine
how the problem can be decomposed into independent processes.
This section describes three methods of problem decomposition,
using the terminology developed in [12]. The three methods are:
relaxation, pipelining, and partitioning.

Relaxation consists of dividing processing into independent
functions, each of which operates on the same input data and
performs a complete function. They are not dependent on data
from each other, and no synchronization is required between
them. This type of decomposition is ideal when multiple tasks
are being performed at the same time, a common occurrence in
real-time systems.

Pipelining consists of dividing the problem into functions that
follow each other sequentially, each performing its portion of
the computation on data from the one before it. Each segment
must produce results at the same rate or the system will
bottleneck at the slowest process. This works well for problems
where complex operations follow each other sequentially over
repeating data inputs.
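
The bottleneck effect can be made concrete with a short
calculation. The sketch below assumes an idealized pipeline:
fixed stage times, one processor per stage, enough buffering
between stages that a stage never waits on a downstream stage,
and no communication overhead. None of these figures come from
the paper, and the function names are hypothetical.

```python
def pipeline_time(stage_times, n_items):
    """Time for n_items to pass through an idealized pipeline.

    The first item takes the sum of all stage times; every later item
    emerges one slowest-stage interval after the previous one.
    """
    if n_items <= 0:
        return 0.0
    return sum(stage_times) + (n_items - 1) * max(stage_times)


def pipeline_speedup(stage_times, n_items):
    """Speedup over running every stage sequentially for every item."""
    sequential = n_items * sum(stage_times)
    return sequential / pipeline_time(stage_times, n_items)
```

For ten items, three balanced stages of one time unit each give
a speedup of 2.5, while stages of 1, 3, and 1 units give only
about 1.56 because the middle stage sets the pace.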

Partitioning is the third method of decomposition. Instead of
each processor performing different computations, groups of
processors work simultaneously on subparts of the same problem.
This method differs from the other two in that partitioning is
done by dividing the problem data domain rather than its
algorithmic functions. The result of the divided data domain is
called the grain, and grain size has an important effect on the
efficiency of the resulting code. Partitioning is most
effective for homogeneous operations performed on large data
sets, typically identified through iterative structures. The
goal is to find the particular loop which constitutes the
greatest computational loading and divide it across the
available processors.
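
As an illustration of dividing the data domain, the sketch below
splits the rows of a matrix product into contiguous grains and
hands one grain to each worker. It shows the structure only and
is not the project's Ada code. The function names are
hypothetical, and in CPython the global interpreter lock means
these threads will not actually speed up this arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor


def split_rows(n_rows, n_workers):
    """Divide row indices into contiguous grains, at most one per worker."""
    base, extra = divmod(n_rows, n_workers)
    grains, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        if size:
            grains.append(range(start, start + size))
        start += size
    return grains


def multiply_rows(a, b, rows):
    """Compute the given rows of the product a * b."""
    cols = range(len(b[0]))
    inner = range(len(b))
    return [[sum(a[i][k] * b[k][j] for k in inner) for j in cols]
            for i in rows]


def parallel_matmul(a, b, n_workers=4):
    """Multiply two matrices (lists of lists), one grain per worker."""
    grains = split_rows(len(a), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() yields results in grain order, so rows stay in order.
        blocks = pool.map(lambda rows: multiply_rows(a, b, rows), grains)
        return [row for block in blocks for row in block]
```

Changing n_workers changes the grain size without touching the
arithmetic, which makes it easy to experiment with the trade-off
between parallelism and per-grain overhead.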

There  are  three  major  limiting  factors  that  impact  the 
decomposition  of  a  problem  and  the  efficiency  of  parallel 
software.  The  first  is  the  computational  overhead  added 
by  parallel  software.  This  overhead  comes  in  one  of  three 
forms:  tasking,  communication,  and  synchronization. 

Tasking overhead consists of the work necessary to perform the
tasking functions previously described. It is a function of the
number of independent processes in the system and the
efficiency of the run-time environment. Communication overhead
is a function of the amount of data passed between processes
and the communications efficiency. It includes the overhead
inherent in the programming language (e.g., in Ada, subprogram
invocation, task rendezvous, task activation and termination,
and data reference and modification), and the computational
cost and time delay incurred by physical message passing.
Synchronization overhead is a function of the number of times
individual processes must suspend operations to communicate
with other processes and the amount of time in that suspended
state. The efficiency of the load balancing has the greatest
effect on this type of overhead.

The  second  limiting  factor  in  parallel  speedup  is  the 
percentage  of  sequential  operations  in  the  problem  that 
cannot  be  parallelized  in  any  manner.  These  operations 
must  be  performed  on  a  single  processor  while  others  are 
held  idle,  reducing  the  efficiency  of  the  overall  algorithm. 
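
This limit is commonly expressed as Amdahl's law, which the
paper does not name. The sketch below computes the bound under
the law's usual assumptions: a fixed problem size, a perfectly
parallel remainder, and no overhead. The function name is
hypothetical.

```python
def amdahl_speedup(sequential_fraction, processors):
    """Upper bound on speedup when part of the work cannot be parallelized."""
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)
```

On a 16-processor machine such as the one used here, a program
that is 10 percent sequential can run at most 6.4 times faster
than on one processor, however well the rest is parallelized.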

The final limiting factor is data contention, when several
processors require access to the same global data element at
the same time. On a shared memory machine, processes queue up
to obtain the data, resulting in a serialization of the
parallel tasks. On a loosely coupled system, data contention
results in increased message passing and increases the
possibility of multiple copies of the same global variable
existing in different states. This effect is called software
lockout and must be minimized whenever possible.

The Kalman filter tracking algorithm was decomposed on two
levels. Relaxation was used at the highest level to separate
the individual filters in the Kalman filter bank and to
separate the functions of the simulator. At a lower level,
partitioning was used for some of the complex matrix operations
required by the algorithm including matrix multiplication and
Cholesky decomposition. These operations were often performed
on large data sets, justifying the added overhead of
parallelization.

2.4  Software  Design  Methodology 

Parallel programming and the Ada language place certain demands
on a software design method. The method must include a means of
describing concurrent processes and the communication between
them. In addition, it must support the software engineering
features of Ada including packages, generics, and advanced data
structures. Finally, it should support software
pre-partitioning using the task construct. One method
specifically developed for designing large, real-time
distributed systems in Ada is the Layered Virtual
Machine/Object-Oriented Design (LVM/OOD) method [12]. It
provides capabilities for describing concurrent processes and
the communication between them and also supports real-time
considerations. Its basis for concurrent processes is the Ada
task, groups of which are encapsulated into packages.

Good software design requires the successful integration of
algorithms and data structures. Both of these components are of
equal importance, but they often raise differing concerns.
LVM/OOD deals with these concerns by combining two design
concepts: layered virtual machines and object oriented design
(OOD).

Layered virtual machines are abstractions of algorithms into
independent processes capable of operating in parallel. These
processes are virtual because they are not associated with
underlying hardware. The run-time environment is responsible
for the binding of processes to processors. The use of layering
creates a hierarchical set of modules that defer implementation
details and support information hiding.

OOD is used to abstract problem space objects into software
data structures and their associated operations. Object
abstractions can be either object managers which encapsulate
data structures or type managers which encapsulate definitions
that form templates for the creation of data structures. These
managers provide the means for describing data and are
especially important in data driven designs.

The layered virtual machine and object oriented design concepts
are combined into an eight step design methodology. The first
step is to determine the hardware interfaces to the control
system, illustrated with a context diagram. Each hardware
device is depicted separately and the internal control system
is represented as a single entity. High-level inputs and
outputs are labeled on the interfaces.

In the second step, each of the external devices, or edge
functions, is assigned a separate process, a simple device
driver with the bare minimum of instructions. Third, the
internal controller is decomposed into its primary components
using data flow diagrams (DFDs). Complex components can be
further decomposed using hierarchical levels of DFDs.

The fourth step is to determine concurrency among the
controller components, to abstract these components into
processes that will run independently. Concurrency
considerations will often lead to grouping components into a
single process. Nielsen and Shumate provide several rules to
assist in this. Functional cohesion suggests that closely
related functions can be grouped into a single process to
reduce overhead. Temporal cohesion collects operations that
occur during the same time period or after the same events. In
both cases, significant overhead can often be saved without
appreciable loss of performance. However, some functions should
be left as separate processes. Time-critical components should
be implemented as separate processes. Periodic functions should
not be combined with operations that run in differing periods.
Finally, background processes should also be separate to
preclude interference with time-sensitive ones and to provide
for the best use of excess processor time.

This fourth step focuses only on the concurrency of the
high-level components identified in the system DFDs. Nielsen
and Shumate recommend against having more than one level of
concurrent tasks, but ignoring lower level partitioning may
forgo performance gains. This research examined further
decomposition, resulting in several matrix operation routines
that increased overall system parallelization.

The fifth step is to determine what type of interfaces exist
between the processes. These interfaces define the
communication between the processes and can be one of several
forms: messages where a reply from the receiver is required,
data transfers where no reply is required, signals used to
coordinate on the occurrence of certain events, and shared data
access. The type of communication determines the amount of
coupling between processes. Messages that require replies have
the highest coupling followed by simple data transfers. The
least coupling is caused by event signals. Shared data access
requires additional attention and must be protected by some
intermediate task to provide for mutual exclusion. Once these
interfaces are identified, processes and their corresponding
interfaces are depicted using a process structure chart.

Step six seeks to reduce coupling by introducing intermediary
processes into the design. First, the processes are translated
into Ada tasks. Between tightly coupled tasks, additional tasks
are added whose sole purpose is to facilitate communication and
thereby decrease coupling. These intermediate tasks can be any
combination of the three types described earlier. The result of
this step is the Ada task graph.


Step seven is to encapsulate the tasks into packages. At the
same time, the objects used for communication between the tasks
should be abstracted into data objects and encapsulated into
packages as well. Nielsen and Shumate suggest that these be
placed in packages to increase modularity, portability, and
reusability. They also provide several rules for task
encapsulation. Tasks should be grouped into the same package if
they have similar functions or if their general nature makes
them good candidates for reuse. Coupling between packages
should be minimized with respect to data types, operators, and
constants by localizing these items to the package or group of
packages in which they are actually used. Finally, the package
should minimize the visibility of task entries to that which is
essential for package interfacing. The products of this step
are an Ada package graph and the corresponding OOD diagrams.

The  final  step  is  the  further  decomposition  of  large 
tasks.  This  may  result  in  another  level  of  concurrency 
which  can  be  depicted  using  process  structure  charts  and 
Ada  task  graphs,  or  it  may  result  in  a  sequential  decompo¬ 
sition  which  is  described  with  structure  charts.  The  results 
of  this  step  are  shown  using  OOD  diagrams. 

The  complete  methodology  was  used  to  design  the  Kal¬ 
man  filter  tracking  system.  Interested  readers  are  referred 
to  [7]  for  the  results. 

3  System  Implementation  and 
Testing 

This section describes the process used to implement the
Kalman filter tracking system from the design developed
using LVM/OOD. A top-down approach was used to build
a skeleton of the system to outline and test system interfaces.
These interfaces were used as a basis for a bottom-up
construction of the final system. Testing was completely
integrated with the implementation, occurring at
the end of each phase of the development.

The  first  phase  of  the  top-down  approach  began  with 
the  coding  of  the  system  packages.  Global  type  descrip¬ 
tions  were  declared  in  the  package  specification  of  the 
package  in  which  they  were  first  referenced,  and  simula¬ 
tion  model  constants  were  declared  in  a  master  package. 
With  these  high  level  data  descriptions  in  place,  a  skeleton 
of  the  complete  system  was  developed  using  the  top-level 
tasks.  Each  task  declaration  was  complete  with  all  of  its 
entries  and  their  corresponding  parameters.  The  task  bod¬ 
ies  were  coded  as  shells  containing  only  the  entry  calls  and 
accept  blocks  from  the  final  implementation.  An  output 
statement  was  included  before  and  after  each  entry  call 
and accept block describing the current state of the task.
All  variables  local  to  the  task  bodies  were  declared  and 
those  used  for  communication  parameters  were  initialized 
to  zero. 

Once the coding was complete, the skeleton was compiled
using the already established order of compilation
based on the package dependencies identified in the design.
This provided a first check of package dependence,
checked the actual parameters in the entry calls against
the formal parameters in the accept statements, and indicated
any improper parameter modes.

Once  compiled,  the  skeleton  system  was  executed  on  a 
single  processor.  This  caused  the  elaboration  of  all  of  the 
task  and  object  declarations  and  activation  of  all  the  tasks. 
It  also  checked  the  system  for  possible  deadlock  conditions. 
Although  no  data  was  processed  or  passed,  all  rendezvous 
took  place  and  were  documented  by  the  output  of  the  task 
state descriptions. The resulting record of system operation
provided clues as to the state of the system at the
point  at  which  deadlock  occurred.  This  information  was 
useful  in  reordering  entry  calls  to  eliminate  the  deadlock 
problem. 
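
A skeleton of this kind can be sketched as follows. This is an
illustration only, not the authors' code: Python threads stand in for
Ada tasks, a queue plus an event stands in for a rendezvous, and the
task names "sensor" and "tracker" are hypothetical. Each task logs its
state before and after every entry call and accept, and passes only a
dummy value. The timeout is an addition made for this sketch so that a
stalled rendezvous appears in the log instead of hanging the run.

```python
import queue
import threading


def run_skeleton(cycles=2, timeout=1.0):
    """Run two shell tasks and return the log of their state messages."""
    log = []
    log_lock = threading.Lock()

    def trace(task, state):
        # Serialize log writes so entries from different tasks do not mix.
        with log_lock:
            log.append((task, state))

    entry = queue.Queue()  # stands in for the tracker's entry queue

    def sensor():
        # Caller side: state message before and after each entry call.
        for i in range(cycles):
            trace("sensor", f"calling put {i}")
            done = threading.Event()
            entry.put((0.0, done))  # dummy parameter, initialized to zero
            if not done.wait(timeout):
                trace("sensor", f"TIMEOUT in put {i}")
                return
            trace("sensor", f"returned from put {i}")

    def tracker():
        # Acceptor side: state message before and after each accept.
        for i in range(cycles):
            trace("tracker", f"waiting at accept {i}")
            try:
                _payload, done = entry.get(timeout=timeout)
            except queue.Empty:
                trace("tracker", f"TIMEOUT at accept {i}")
                return
            trace("tracker", f"accepted {i}")
            done.set()  # release the caller, ending the rendezvous

    threads = [threading.Thread(target=sensor),
               threading.Thread(target=tracker)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

The order of entries from different tasks may vary from run to run,
but the entries of each individual task appear in program order. If the
calls and accepts were mismatched, the last entry logged by each task
would show where it stopped.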

Despite the usefulness of the system state output technique,
it does have limitations. There is no guarantee of
what an output device driver will do when a concurrent
system deadlocks. Messages may be lost and the resulting
state record may not be complete. Even a deadlock-free
system at this point does not guarantee continued correct
operation once the skeleton has been filled in with all
the required processing. Because of the non-deterministic
nature of the tasking model, additional processing load
may lead to a different task scheduling order which may
present opportunities for deadlock that did not exist in the
skeleton. Only careful analysis of the system design and
the results of tests can lead to a deadlock-free system.
Nonetheless, until better tools for following tasking flow of
control are made available, the system state description log
is one of the best methods of finding concurrent software
communication errors.

The second stage of the top-down part of the development
involved adding the subprogram specifications to
their  package  specifications  and  subprogram  stubs  to  the 
corresponding  package  bodies,  and  recompiling.  Package 
dependencies  were  checked  again,  and  the  formal  and  ac¬ 
tual  parameters  in  the  subprogram  specifications  and  calls 
were  compared  for  type  mismatches.  The  skeleton  system 
was  executed  again  to  force  elaboration  of  the  subprogram 
local  variables. 

The  completed  skeleton  provided  a  framework  for  fur¬ 
ther  development  by  documenting  the  interfaces  between 
the  high-level  processes.  This  knowledge  formed  a  basis 
for  further  development  of  the  system  from  the  bottom 
up. The first step was the coding of several reusable matrix
operation routines. Because these operated on large
matrices, they were parallelized to provide better response
times. A single, shared copy of the data was used to
eliminate rendezvous overhead, and the number of tasks
spawned to complete the operation was varied dynamically,
based on the size of the particular matrix.
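
The approach can be sketched as follows. This is an illustration only,
not the authors' routines: the sizing rule (rows_per_task, max_tasks)
and the function names are assumptions, and Python threads are used
only to show the structure, since they do not give true parallel
speed-up for pure Python arithmetic. All workers read the same input
matrices and write disjoint rows of one shared result, so no data is
passed between tasks.

```python
import threading


def choose_task_count(rows, rows_per_task=64, max_tasks=8):
    """Pick a number of worker tasks from the matrix size.

    The thresholds are hypothetical; the source does not state the rule
    that was used.
    """
    needed = -(-rows // rows_per_task)  # ceiling division
    return max(1, min(max_tasks, needed))


def parallel_matmul(a, b, rows_per_task=64, max_tasks=8):
    """Multiply a (n x k) by b (k x m), splitting rows among workers."""
    n = len(a)
    if n == 0:
        return []
    k, m = len(b), len(b[0])
    result = [[0.0] * m for _ in range(n)]  # single shared result
    tasks = choose_task_count(n, rows_per_task, max_tasks)

    def work(first, last):
        # Each worker owns rows first..last-1, so writes never overlap.
        for i in range(first, last):
            for j in range(m):
                result[i][j] = sum(a[i][p] * b[p][j] for p in range(k))

    chunk = -(-n // tasks)  # rows per worker, rounded up
    threads = [
        threading.Thread(target=work, args=(start, min(start + chunk, n)))
        for start in range(0, n, chunk)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result
```

A small matrix is handled by a single worker, which avoids paying task
creation cost where there is too little work to share, while a large
matrix is split among up to max_tasks workers.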

Using the matrix routines as atomic operations, the
rest of the system was developed from the lowest level
subroutines upward. Difficulties in debugging were caused
by the fact that the Encore run-time system would not
propagate unhandled exceptions out of tasks. When an
error occurred within a task without an exception handler,
the run-time system would simply hang without an error
message. Often this required moving the code to another
development environment in order to determine the error.
The portability of Ada was a real advantage here, as the
entire Kalman filter tracking system could be ported with
minimal changes to the code.
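
One defensive pattern against silent task failure is to give every
task body a catch-all handler that reports the error. The sketch below
is an illustration only and is not described in the source: Python
threads stand in for Ada tasks, and the names guarded_task and run_all
are hypothetical.

```python
import queue
import threading


def guarded_task(name, body, failures):
    """Wrap a task body so that any exception is recorded, not lost."""

    def run():
        try:
            body()
        except Exception as exc:
            # Report which task failed and why, then let the task end.
            failures.put((name, exc))

    return threading.Thread(target=run, name=name)


def run_all(bodies):
    """Run all task bodies and return a sorted list of failures.

    bodies maps a task name to a callable taking no arguments. Each
    failure is reported as (task name, exception type, message).
    """
    failures = queue.Queue()
    threads = [guarded_task(name, body, failures)
               for name, body in bodies.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    report = []
    while not failures.empty():
        name, exc = failures.get()
        report.append((name, type(exc).__name__, str(exc)))
    return sorted(report)
```

With a handler of this kind in place, an error inside a task produces
a named report instead of an unexplained stall, which reduces the need
to move the code to another environment just to find the failure.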

4  Conclusions

One  objective  of  this  research  was  to  examine  the  soft¬ 
ware  engineering  issues  surrounding  the  use  of  Ada  for 
concurrent  software  systems.  The  implementation  of  the 
Kalman  filter  tracking  system  proved  that  the  task  is  a 
viable construct for parallel process partitioning. In addition,
it proved that careful problem decomposition will
result  in  more  than  adequate  computational  load  to  out¬ 
weigh  the  inherent  overhead  of  the  tasking  model  and  the 
Ada  run-time  system.  The  use  of  intermediary  processes 
proved  very  useful  in  reducing  synchronization  overhead 
and,  thereby,  increasing  the  parallel  efficiency  of  the  sys¬ 
tem. 

Nielsen  and  Shumate’s  [12]  design  methodology  proved 
to be an effective means of documenting a parallel Ada design.
It provides good rule-based decision-making support
at both the system and detailed design levels. The graphical
tools are adequate to describe the state of the system at
each  stage  of  development,  and  each  level  of  problem  ab¬ 
straction  is  supported.  The  method  does  tend  to  be  more 
functionally  oriented  in  some  stages  (neglecting  the  OOD 
portion  of  the  methodology),  and  it  was  supplemented  in 
those areas by material from [1].

Two  major  areas  of  difficulty  were  encountered  in  the 
design  phase  of  the  project.  The  first  was  the  discovery  in 
the implementation phase that one part of the algorithm
was a significant roadblock to parallel speed-up with
the tracking system. Although some problem was expected
based on the results of the initial decomposition, the magnitude
of the bottleneck was unexpected. Clearly, it would
have  been  better  to  recognize  the  severity  of  the  problem 
earlier  in  the  design  process  or  even  in  the  initial  anal¬ 
ysis.  While  the  factors  limiting  decomposition  efficiency 
were  discussed  early  in  the  guidelines,  no  real  method  was 
available  for  discovering  the  relative  magnitude  of  these 
factors. 

In  a  parallel  environment,  such  a  method  cannot  be 
based  solely  on  standard  algorithmic  complexity  analy¬ 
sis.  It  must  include  knowledge  of  actual  module  execution 
speeds  and  actual  time  delays  for  task  allocation,  schedul¬ 
ing,  and  communication.  Whether  such  information  could 
be  determined  during  analysis  and  design,  without  some 
degree  of  code  test  cases,  is  uncertain.  Also,  the  avail¬ 
ability  of  such  information  presupposes  knowledge  of  the 
specific  implementation  hardware  which  may  not  be  avail¬ 
able  in  the  early  stages  of  the  design.  The  form  of  such  a 
complexity  analysis  method,  as  well  as  where  in  the  devel¬ 
opment  cycle  it  should  fit,  are  areas  which  require  further 
study. 
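
To illustrate why measured timings matter, the sketch below uses a
standard Amdahl-style estimate extended with a per-task overhead term.
It is not a method from the source: the model, the function name, and
the parameters are assumptions, with all times normalized so that the
sequential run takes one unit.

```python
def estimated_speedup(serial_fraction, tasks, overhead_per_task=0.0):
    """Estimate speed-up for a program split among a number of tasks.

    serial_fraction:   share of the run time that cannot be parallelized
    tasks:             number of tasks sharing the parallel part
    overhead_per_task: assumed cost of allocation, scheduling, and
                       communication per task, as a fraction of the
                       sequential run time
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if tasks < 1:
        raise ValueError("tasks must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    parallel_time = (serial_fraction
                     + parallel_fraction / tasks
                     + overhead_per_task * tasks)
    return 1.0 / parallel_time
```

Under this model, a module that keeps half of the run time sequential
can never reach a speed-up of two, however many tasks are used, and a
nonzero overhead term can make more tasks slower than fewer. Both
inputs are timing measurements rather than algorithmic complexity
results, which is the gap in the guidelines described above.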

A  second  limitation  to  the  guidelines  was  the  lack  of 
a  method  for  specifying  independent  process  run-time  in¬ 
teraction  graphically.  This  would  have  been  very  help¬ 
ful  in  identifying  possible  deadlock  situations  and  would 
have provided a means of showing data dependency among
tasks, which would have helped in analyzing task coupling.
A very complex problem, involving many layers of tasks,
would be impossible to comprehend without some graphical
display of task interaction. An automated graphics
tool  would  be  very  helpful  in  this  area. 

The  second  objective  of  this  effort  was  to  determine 
the  adequacy  of  the  Ada  language  for  parallel  software 
development  through  the  implementation  of  a  real-world 
problem. The implementation of the Kalman filter tracking
system highlighted a very important distinction between
the adequacy of the language itself (as described by
MIL-STD-1815A) and the adequacy of the tools (compilers,
debuggers, run-time systems, etc.) currently available
to  support  it. 

The Ada language proved to be an excellent means
of abstracting a parallel problem. The task construct is
ideal for encapsulating independent processes, while the
rendezvous provides the means for both task synchronization
and communication without resorting to machine-dependent
parallel language constructs. The ability to dynamically
spawn tasks was useful as a load balancing tool.
It made it possible for a generalized routine to vary the
number of tasks spawned based on the size of the particular
data structure being operated on.

Aside  from  the  task  construct,  other  features  of  the 
language  made  implementation  more  productive  as  well. 
The  use  of  the  package  construct  greatly  assisted  in  mod¬ 
ular  development.  The  use  of  generics  and  unconstrained 
arrays  made  it  possible  to  construct  general  support  rou¬ 
tines  capable  of  operating  on  a  wide  range  of  data  forms. 
Finally,  the  standardization  of  the  language  was  itself  an 
advantage.  Because  of  the  difficulties  encountered  with 
the  debugging  tools,  four  different  Ada  environments  res¬ 
ident  on  four  different  hardware  architectures  were  used 
during  the  development  process.  The  use  of  Ada  made 
it  possible  to  move  modules  of  code  freely  between  these 
environments  on  a  frequent  basis  with  a  bare  minimum  of 
changes. 

While  the  Ada  language  provided  ideal  support  for  the 
implementation  of  the  system,  the  tools  available  to  sup¬ 
port  it  still  have  a  great  deal  of  maturing  to  do.  Although 
the  compiler  used  on  the  Encore  was  validated,  it  soon  be¬ 
came  obvious  that  development  in  a  parallel  environment 
is  as  dependent  on  the  correct  operation  of  the  run-time 
system  as  it  is  on  the  compiler  generating  valid  executable 
code.  Several  problems  were  encountered  with  the  Encore 
run-time  system  in  the  areas  of  stack  checking,  exception 
propagation,  and  task  allocation. 

Because the Ada language standard does not cover run-time
system implementation, there is no validation capability
available for run-time system operation. Therefore,
as was the case with previous languages, the user is dependent
on vendor testing to ensure a valid system. A
method of central validation was one of the major reasons
Ada was developed, and serious consideration should be
given to adding run-time system standards to the current
language standard.

The  development  tools  available  for  this  project  were 
also  inadequate.  The  compilers  used  were  slow,  especially 
toward  the  end  of  the  project  when  long  lists  of  depen¬ 
dent  packages  had  to  be  recompiled  because  of  minor  code 
changes. The symbolic debugger was next to useless in
a  multitasking  environment  and  did  not  support  concur¬ 
rent  multitasking  at  all.  There  were  also  no  tools  available 
to  monitor  the  execution  of  the  parallel  processes,  making 
task  analysis  very  difficult.  Finally,  the  documentation 
available  on  the  concurrent  aspects  of  the  run-time  system 
was  very  sparse. 

Parallel  processing  and  the  Ada  language  do  hold  great 
promise  for  developing  complex  real-time  and  ground-based 
systems.  The  successful  implementation  of  this  project 
proves  that  concurrent  Ada  is  a  reality  and  provides  a 
number  of  advantages  not  found  in  other  languages.  How¬ 
ever,  much  research  remains  to  be  done  in  this  area.  Soft¬ 
ware  engineering  methods  must  be  updated  to  meet  the 
new  challenges  of  concurrent  Ada  development.  Also,  ad¬ 
vances  are  needed  in  Ada  tools  and  automated  parallel 
design  tools  before  any  software  project  the  size  of  that 
required  by  SDI  can  be  attempted. 